Abstract — A central problem of surveillance is to monitor multiple targets moving in a large-scale, obstacle-ridden environment with occlusions. This paper presents a novel principled Partially Observable Markov Decision Process-based approach to coordinating and controlling a network of active cameras for tracking and observing multiple mobile targets at high resolution in such surveillance environments. Our proposed approach is capable of (a) maintaining a belief over the targets' states (i.e., locations, directions, and velocities) to track them, even when they may not be observed directly by the cameras at all times, (b) coordinating the cameras' actions to simultaneously improve the belief over the targets' states and maximize the expected number of targets observed with a guaranteed resolution, and (c) exploiting the inherent structure of our surveillance problem to improve its scalability (i.e., linear time) in the number of targets to be observed. Quantitative comparisons with state-of-the-art multi-camera coordination and control techniques show that our approach can achieve higher surveillance quality in real time. The practical feasibility of our approach is also demonstrated using real AXIS 214 PTZ cameras.

I. Introduction 

Monitoring, tracking, and observing multiple mobile targets in a large-scale, obstacle-ridden environment (e.g., airport terminals, railway stations, bus depots, and shopping malls) is a central problem of surveillance. It is often necessary to acquire high-resolution videos/images of these targets. Traditionally, such high-quality surveillance is achieved by placing a large number of static cameras to completely cover the large environment, which is impractical in terms of equipment, installation, and maintenance costs. Recent works [6], [9], [?] have employed a heterogeneous network of wide-view static camera(s) to detect and track the targets within the environment at low resolution, together with some active pan/tilt/zoom (PTZ) cameras that are directed and focused on these targets to observe them at high resolution. Such surveillance systems face two serious practical limitations: (a) the obstacles in the environment (e.g., physical structures like walls, pillars, and barriers) are likely to occlude the fields of view (fov) of the static cameras, so these cameras cannot detect or track the targets that reside within the occluded regions; since the surveillance system is not informed of these targets, the active cameras may not be directed to observe them; and (b) when the targets move further away from the low-resolution static cameras, their measured locations become less accurate regardless of the calibration method, and the vision algorithms to detect and recognize the targets also grow less reliable.

More importantly, the above limitations raise a practical implication affecting real-world multi-camera surveillance in general: the exact locations of the targets may not be observed directly by (or be fully observable to) the cameras at all times. Such an environment is said to be partially observable [?]. Instead of introducing additional static cameras to resolve this issue of partial observability, we propose an alternative that maintains a probabilistic belief over the targets' possible locations, directions, and velocities in the environment. Our proposed alternative offers a practical advantage over [15], [?], [14] by eliminating the dependence on wide-fov static cameras to track the targets' locations and enabling the active PTZ cameras to perform the dual roles of tracking the targets and observing them at high resolution. Hence, we will focus on using only active PTZ cameras in this paper, though our approach is not limited to them.

This paper presents a novel principled decision-theoretic approach to coordinating and controlling a network of active cameras for tracking and observing multiple mobile targets at high resolution in uncertain, partially observable surveillance environments. Our proposed approach stems from framing the surveillance problem formally using a rich class of models for decision making under uncertainty called the Partially Observable Markov Decision Process (POMDP) (Section III). Specifically, it overcomes the above limitations by (a) modeling a belief over the targets' states (i.e., locations, directions, and velocities) and updating the belief in a Bayesian paradigm (Section III-D) based on probabilistic models of the targets' motion (Section III-B) and the active cameras' observations (Section III-C); (b) coordinating the active cameras' actions to simultaneously improve the belief over the targets' states and maximize the expected number of targets observed with a guaranteed pre-defined resolution (Sections III-E and IV); and (c) exploiting the inherent structure of our surveillance problem to improve its scalability (i.e., linear time) in the number of targets to be observed (Section IV). Our POMDP-based approach is empirically evaluated in simulations over various realistic surveillance environments (Section V-A) and with real AXIS 214 PTZ cameras to demonstrate its practical feasibility (Section V-B).



II. Related Works 

As mentioned earlier, existing multi-camera coordination and control techniques have to operate in a fully observable surveillance environment where the locations, directions, and velocities of all the targets can be directly observed/estimated, either by using additional low-resolution static cameras and sensors [?], [6], [9], [14], [15] or by configuring one or more active cameras to zoom out to their wide view [?], [11], [12], [13]. They use this information about the targets to predict their trajectories in order to schedule, coordinate, and control the network of active cameras to focus on and observe these targets at high resolution.

The major drawbacks of these techniques are: (a) they cannot be deployed in real-world surveillance environments with occlusions, because they cannot observe the targets that reside in the occluded regions, hence limiting the active cameras' full surveillance capability. In contrast, our approach does not assume that all targets can be fully observed at every time instance, and hence models a belief over the targets' states to keep track of them when they are not observed by any of the cameras; (b) since the resolution of the wide-view static cameras is low, they often produce inaccurate locations of the targets. This in turn induces errors in the targets' directions and velocities, which consequently affect the prediction capability of existing surveillance systems. On the other hand, our approach uses only active cameras to observe the targets at high resolution, thus keeping location errors minimal; and (c) many existing techniques have serious scalability issues in the number of targets to be observed. Our approach extends our previous work [9] to achieve such scalability in partially observable surveillance environments.

III. System Overview and Problem Formulation 
Fig. 1. POMDP controller for coordinating active cameras to perform high-quality surveillance in a partially observable environment.

A POMDP controller models the interaction between the active cameras and the partially observable surveillance environment. In particular, it is responsible for coordinating the cameras' actions to achieve a high-level surveillance goal, which can be defined formally using a real-valued objective function and, in the context of this paper, is to maximize the number of targets observed with a guaranteed resolution (Section III-E). Once calibrated, the active cameras can determine the locations of the targets observed in their fov, which are communicated to the controller. The targets are assumed to be non-evasive, and their motion cannot be controlled or influenced by the cameras. The targets' correspondences across multiple cameras are resolved by distinct features like color and texture.

Formally, a POMDP controller is defined as a tuple $(\mathcal{S}, \mathcal{A}, \mathbf{Z}, T_f, O_f, R)$ consisting of

• a set $\mathcal{S}$ of joint states of active cameras and targets in the surveillance environment (Section III-A);

• a set $\mathcal{A}$ of joint actions of active cameras (Section III-A);

• a set $\mathbf{Z}$ of joint observations of the targets taken by the cameras (Section III-A);

• a transition function $T_f : \mathcal{S} \times \mathcal{A} \times \mathcal{S} \to [0, 1]$ denoting the probability $P(S' \mid S, A)$ of going from the current joint state $S \in \mathcal{S}$ to the next joint state $S' \in \mathcal{S}$ using the joint action $A \in \mathcal{A}$ (Section III-B);

• an observation function $O_f : \mathbf{Z} \times \mathcal{S} \to [0, 1]$ denoting the probability $P(Z \mid S)$ of observing the joint observation $Z \in \mathbf{Z}$ given the joint state $S \in \mathcal{S}$ (Section III-C); and

• a real-valued objective/reward function $R : \mathcal{S} \to \mathbb{R}$ representing a high-level surveillance goal (Section III-E).

At any given time, the exact state of the environment is not fully observable to the POMDP controller. Instead, it maintains a belief $B$ over the set $\mathcal{S}$ of all possible states (Section III-D), that is, $B(S)$ is the probability that the environment is in the state $S \in \mathcal{S}$ such that $\sum_{S \in \mathcal{S}} B(S) = 1$. As shown in Fig. 1, at every time step, the POMDP controller issues an action $A \in \mathcal{A}$ and makes an observation $Z \in \mathbf{Z}$ from the environment. Based on the action $A$ and observation $Z$, the prior belief $B$ is updated by Bayes' rule to the posterior belief $B'$ as follows:



$$B'(S') = \eta \, P(Z \mid S') \sum_{S \in \mathcal{S}} P(S' \mid S, A)\, B(S) \tag{1}$$

where $\eta = 1/P(Z \mid B, A)$ is a normalizing constant. A policy $\pi$ for the POMDP controller is defined as a mapping from each belief $B$ to an action $A$ (Section IV). Solving a POMDP involves choosing the optimal policy $\pi^*$ that maximizes the expected reward for any given belief $B$:

$$\pi^*(B) = \arg\max_{A \in \mathcal{A}} \sum_{Z' \in \mathbf{Z}} R(B')\, P(Z' \mid B, A) .$$

When the number of targets and active cameras increases, the state space and hence the belief space of the POMDP grow exponentially (Section III-A). Therefore, computing the optimal policy incurs exponential time. Fortunately, by exploiting the structure of our surveillance problem (Sections III-B and III-C), the optimal policy for a given belief $B$ can be computed efficiently (Section IV).
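To make the Bayesian update (1) and the one-step policy above concrete, the following is a minimal sketch over an explicitly enumerated (unfactored) state space, assuming the transition, observation, and reward models are supplied as plain Python callables. The function names and the two-cell toy instance are hypothetical illustrations, not the authors' implementation; because every state and every observation is enumerated, this direct form is exactly what becomes exponential when states are joint over many targets.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple

Belief = Dict[Hashable, float]


def belief_update(belief: Belief, action, obs, states: List,
                  trans: Callable, obs_prob: Callable) -> Tuple[Optional[Belief], float]:
    """Eq. (1): B'(S') = eta * P(Z|S') * sum_S P(S'|S,A) B(S), with eta = 1 / P(Z|B,A)."""
    unnorm = {}
    for s2 in states:
        predicted = sum(trans(s, action, s2) * p for s, p in belief.items())
        unnorm[s2] = obs_prob(obs, s2) * predicted
    p_obs = sum(unnorm.values())  # P(Z | B, A)
    if p_obs == 0.0:
        return None, 0.0          # Z cannot occur under B and A
    return {s2: w / p_obs for s2, w in unnorm.items()}, p_obs


def greedy_policy(belief: Belief, actions: Iterable, observations: Iterable,
                  states: List, trans: Callable, obs_prob: Callable,
                  reward: Callable) -> Tuple[Hashable, Dict]:
    """pi*(B) = argmax_A sum_Z R(B') P(Z|B,A), where R(B) = sum_S R(S) B(S)."""
    values = {}
    for a in actions:
        v = 0.0
        for z in observations:
            post, p_obs = belief_update(belief, a, z, states, trans, obs_prob)
            if post is not None:  # zero-probability observations contribute nothing
                v += p_obs * sum(reward(s) * p for s, p in post.items())
        values[a] = v
    return max(values, key=values.get), values


# Hypothetical toy instance: one target in cell 0 or 1, one camera viewing cell 0 or 1.
STATES = [(t, c) for t in (0, 1) for c in (0, 1)]   # S = (target cell, camera cell)


def trans(s, a, s2):
    """Target stays w.p. 0.8; the camera moves deterministically to cell a."""
    p_target = 0.8 if s2[0] == s[0] else 0.2
    return p_target * (1.0 if s2[1] == a else 0.0)


def obs_prob(z, s2):
    """Exact location if the target is in the camera's view, else the null observation None."""
    if s2[0] == s2[1]:
        return 1.0 if z == s2[0] else 0.0
    return 1.0 if z is None else 0.0


def reward(s):
    """1 if the target lies in the camera's view, else 0."""
    return 1.0 if s[0] == s[1] else 0.0


if __name__ == "__main__":
    B = {(0, 1): 0.9, (1, 1): 0.1}
    best, vals = greedy_policy(B, [0, 1], [0, 1, None], STATES, trans, obs_prob, reward)
    print(best, round(vals[0], 2), round(vals[1], 2))  # 0 0.74 0.26
```

Here the camera is pointed at cell 0 because the target, believed to be there with probability 0.9, is most likely to remain there.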



A. States, Actions, and Observations 

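A joint state $S \in \mathcal{S}$ of the POMDP controller is defined as a pair of joint states $T \in \mathcal{T}^m$ of $m$ targets and $C \in \mathcal{C}^n$ of $n$ active cameras, where $\mathcal{T}$ and $\mathcal{C}$ denote the sets of all possible states of each target and active camera, respectively. That is, $S = (T, C)$ and $\mathcal{S} = \mathcal{T}^m \times \mathcal{C}^n$. Let $T = (t_1, t_2, \ldots, t_m) \in \mathcal{T}^m$ and $C = (c_1, c_2, \ldots, c_n) \in \mathcal{C}^n$, where $t_k \in \mathcal{T}$ and $c_i \in \mathcal{C}$ denote the corresponding states of target $k$ and camera $i$. Let $t_k = (t_{l_k}, t_{d_k}, t_{v_k}) \in \mathcal{T}_l \times \mathcal{T}_d \times \mathcal{T}_v$, where $t_{l_k}$, $t_{d_k}$, and $t_{v_k}$ denote target $k$'s location, direction, and velocity, respectively. That is, $\mathcal{T} = \mathcal{T}_l \times \mathcal{T}_d \times \mathcal{T}_v$.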

The state space $\mathcal{C}$ of an active camera is a finite set of discrete pan/tilt/zoom positions. Let $\mathrm{fov}(c_i) \subseteq \mathcal{T}_l$ be the subset of target locations lying within the fov of camera $i$ in its state $c_i$. The joint fov of all cameras in joint state $C$ is defined as $\mathrm{fov}(C) = \bigcup_{i=1}^{n} \mathrm{fov}(c_i)$. The depth of fov of each active camera is limited such that (a) the images of the targets detected within its fov satisfy a pre-defined resolution, and (b) the observed locations of the targets detected within its fov are of minimal location error. This is done by adjusting the zoom parameter of each camera based on its position.

The joint actions of the POMDP controller are PTZ commands that move the corresponding cameras to their specified states. Let a joint action of the $n$ cameras be denoted by $A = (a_1, a_2, \ldots, a_n) \in \mathcal{A}$, where $a_i$ denotes the PTZ command of camera $i$.

Let $\mathcal{Z} = \mathcal{T}_l \cup \{\phi\}$ denote the set of all possible observations of a target, comprising the set $\mathcal{T}_l$ of all possible locations of the target in the environment and a null observation $\phi$ when the target is not observed by any of the cameras. Let an observation of target $k$ be denoted by $z_k \in \mathcal{Z}$ and a joint observation of the $m$ targets be denoted by $Z = (z_1, z_2, \ldots, z_m) \in \mathcal{Z}^m$. That is, $\mathbf{Z} = \mathcal{Z}^m$.
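The exponential growth mentioned above can be tallied directly from these definitions. The sketch below is illustrative arithmetic only: it assumes the hall setup of Section V-A ($|\mathcal{T}_l| = 200$, $|\mathcal{T}_d| = 8$), a single velocity value, $n = 4$ cameras with $|\mathcal{C}| = 3$ states each, and one PTZ command per camera state (so that $|\mathcal{A}| = |\mathcal{C}|^n$); the function name is hypothetical.

```python
def joint_space_sizes(n_locations: int, n_directions: int, n_velocities: int,
                      n_camera_states: int, m: int, n: int) -> dict:
    """Illustrative sizes of the joint spaces defined in Section III-A."""
    per_target = n_locations * n_directions * n_velocities  # |T| = |T_l| |T_d| |T_v|
    per_observation = n_locations + 1                        # |Z| = |T_l| + 1 (null observation)
    return {
        "T": per_target,
        "S": per_target ** m * n_camera_states ** n,         # |S| = |T|^m |C|^n
        "A": n_camera_states ** n,                           # assumes one command per camera state
        "Z_joint": per_observation ** m,                     # |Z^m| = |Z|^m
    }


if __name__ == "__main__":
    for m in (1, 2, 3):
        sizes = joint_space_sizes(200, 8, 1, 3, m, 4)
        print(m, sizes["S"], sizes["Z_joint"])
```

Under these assumptions, $|\mathcal{S}|$ is 129,600 for one target, 207,360,000 for two, and 331,776,000,000 for three, while the number of joint observations grows as $201^m$.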



B. Transition Model $T_f$

By exploiting the following structural assumptions in the state transition dynamics of the surveillance environment:

• camera $i$'s next state $c'_i$ is conditionally independent of the other $n-1$ cameras' states and actions and the $m$ targets' states given its current state $c_i$ and action $a_i$ for $i = 1, \ldots, n$; and

• target $k$'s next state $t'_k$ is conditionally independent of the $n$ cameras' states and actions (i.e., the target's motion is not affected by the cameras' states and actions) and the other $m-1$ targets' states (i.e., every target moves independently) given its current state $t_k$ for $k = 1, \ldots, m$,

the transition model $T_f$ can be factored into transition models of individual targets and active cameras, hence significantly reducing the time incurred to compute the optimal policy $\pi^*$ for a given belief $B$ (Section IV). Furthermore, since modern active cameras are able to move to their specified positions accurately [1], it is practical to assume the transition model of each individual camera to be deterministic and consequently represented by a function $\tau$ that moves camera $i$ from its current state $c_i$ to its next state $\tau(c_i, a_i)$ by the action $a_i$. Then, the transition model of the POMDP controller can be simplified to

$$P(S' \mid S, A) = \prod_{k=1}^{m} P(t'_k \mid t_k) \prod_{i=1}^{n} \delta_{\tau(c_i, a_i)}(c'_i) \tag{2}$$

where $\delta_x(x')$ is a Kronecker delta function of value 1 if $x' = x$, and 0 otherwise. Details on the derivation of (2) are reported in [9]. The state transition of target $k$ from $t_k$ to $t'_k$ includes stochastic transitions of its location from $t_{l_k}$ to $t'_{l_k}$, its direction from $t_{d_k}$ to $t'_{d_k}$, and its velocity from $t_{v_k}$ to $t'_{v_k}$. So, the transition probability of target $k$ can be factored into transition probabilities of its location, direction, and velocity:

$$P(t'_k \mid t_k) = P(t'_{l_k} \mid t_{l_k}, t'_{d_k}, t'_{v_k})\, P(t'_{d_k} \mid t_{d_k})\, P(t'_{v_k} \mid t_{v_k}) .$$

The transition probabilities $P(t'_{d_k} \mid t_{d_k})$ and $P(t'_{v_k} \mid t_{v_k})$ of the target's direction and velocity are, respectively, modeled as Gaussian distributions $\mathcal{N}(\mu_d, \sigma_d)$ and $\mathcal{N}(\mu_v, \sigma_v)$, with the means $\mu_d$ and $\mu_v$ being the current direction and velocity of the target, and $\sigma_d$ and $\sigma_v$ being the variance parameters, which are learned from a dataset of the targets' trajectories in the environment. The transition probability $P(t'_{l_k} \mid t_{l_k}, t'_{d_k}, t'_{v_k})$ of the target's next location is constructed using the general velocity-direction motion model, as described in [5].
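A minimal sketch of this factored target transition on a 2-D grid is given below. It rests on assumptions that are not from the paper: $\sigma_d$ and $\sigma_v$ are treated as standard deviations of Gaussians discretized over finite direction and speed sets and renormalized; speeds are integers in cells per time step; and the location model deterministically moves the target by its rounded displacement along the new heading, clipping at the grid boundary and ignoring obstacles. The velocity-direction model of [5] may differ, and all names and default values are hypothetical.

```python
import math

# Eight discretized headings (degrees), as used in the simulations of Section V-A.
DIRECTIONS = [0, 45, 90, 135, 180, -135, -90, -45]


def angle_diff(a: float, b: float) -> float:
    """Signed angular difference a - b in degrees, wrapped to [-180, 180)."""
    return (a - b + 180) % 360 - 180


def discretized_gaussian(values, mean, sigma, diff):
    """Gaussian weights exp(-diff^2 / (2 sigma^2)) over a finite set, renormalized to sum to 1."""
    weights = [math.exp(-diff(v, mean) ** 2 / (2.0 * sigma ** 2)) for v in values]
    total = sum(weights)
    return {v: w / total for v, w in zip(values, weights)}


def target_transition(t, width, height, sigma_d=30.0, speeds=(1, 2), sigma_v=0.5):
    """P(t'|t) = P(t'_l | t_l, t'_d, t'_v) P(t'_d | t_d) P(t'_v | t_v) for t = ((x, y), d, v).

    Returns a dict mapping each reachable next state t' to its probability.
    """
    (x, y), d, v = t
    p_dir = discretized_gaussian(DIRECTIONS, d, sigma_d, angle_diff)      # mean = current direction
    p_vel = discretized_gaussian(speeds, v, sigma_v, lambda a, b: a - b)  # mean = current velocity
    result = {}
    for d2, pd in p_dir.items():
        for v2, pv in p_vel.items():
            # Assumed location model: move v2 cells along heading d2 (rounded to the grid),
            # clipped at the grid boundary; obstacles are ignored in this sketch.
            x2 = min(max(x + round(v2 * math.cos(math.radians(d2))), 0), width - 1)
            y2 = min(max(y + round(v2 * math.sin(math.radians(d2))), 0), height - 1)
            key = ((x2, y2), d2, v2)
            result[key] = result.get(key, 0.0) + pd * pv
    return result
```

For example, `target_transition(((5, 5), 0, 1), 10, 10)` places its largest probability on continuing straight ahead at the same speed, i.e., on the next state `((6, 5), 0, 1)`.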

C. Observation Model $O_f$

Similar to the factorization of the transition model $T_f$, the observation model $O_f$ can also be factored into observation models of individual targets using the following structural assumption: the observed location $z_k \in \mathcal{Z}$ of target $k$ is conditionally independent of the observed and true states of the other $m-1$ targets and of its true direction $t_{d_k} \in \mathcal{T}_d$ and velocity $t_{v_k} \in \mathcal{T}_v$, given its true location $t_{l_k} \in \mathcal{T}_l$ and the joint state $C \in \mathcal{C}^n$ of the $n$ active cameras, for $k = 1, \ldots, m$. As a result, the time incurred to compute the optimal policy $\pi^*$ for a given belief $B$ can be significantly reduced (Section IV). Then, the observation model of the POMDP controller can be simplified to

$$P(Z \mid S) = \prod_{k=1}^{m} P(z_k \mid t_{l_k}, C) . \tag{3}$$



The derivation of (3) is reported in Appendix A. The observation probability $P(z_k \mid t_{l_k}, C)$ of target $k$ depends on whether the target lies within the joint fov of the active cameras. When the target lies within the cameras' joint fov corresponding to their joint state $C$ (i.e., $z_k \neq \phi$), the observation model of target $k$ becomes deterministic:

$$P(z_k \mid t_{l_k}, C) = \begin{cases} 1 & \text{if } z_k = t_{l_k} \wedge t_{l_k} \in \mathrm{fov}(C), \\ 0 & \text{otherwise.} \end{cases}$$



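On the other hand, when target $k$ does not lie within the joint fov of the active cameras corresponding to their joint state $C$, the observation probability of target $k$ is uniformly distributed over the locations not covered by the joint fov (i.e., $\overline{\mathrm{fov}(C)} = \mathcal{T}_l \setminus \mathrm{fov}(C)$):

$$P(z_k = \phi \mid t_{l_k}, C) = \begin{cases} 1/|\overline{\mathrm{fov}(C)}| & \text{if } t_{l_k} \notin \mathrm{fov}(C), \\ 0 & \text{otherwise.} \end{cases}$$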
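The two cases above can be written as a single per-target function. In this sketch, which is an assumption-laden illustration with hypothetical names, `None` encodes the null observation $\phi$, `fov` is the set $\mathrm{fov}(C)$, and `locations` is $\mathcal{T}_l$. As written in the model, the null-observation term is the constant $1/|\overline{\mathrm{fov}(C)}|$ rather than 1; because that constant is the same for every uncovered location, it cancels when the belief is renormalized.

```python
def observation_prob(z, t_l, fov, locations):
    """P(z_k | t_lk, C) of Section III-C; z=None encodes the null observation phi.

    fov: set of locations covered by the cameras' joint state C, i.e. fov(C).
    locations: the set T_l of all target locations.
    """
    if t_l in fov:
        # Inside the joint fov, the target is observed exactly at its true location.
        return 1.0 if z == t_l else 0.0
    if z is None:
        # Outside the joint fov: null observation with the constant 1/|T_l \ fov(C)|.
        return 1.0 / len(set(locations) - set(fov))
    return 0.0
```

For instance, with locations 0 to 9 and $\mathrm{fov}(C) = \{2, 3, 4\}$, a target at location 7 yields the null observation with value 1/7, and any non-null observation of it has probability 0.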



D. Bayesian Belief Update 

By making use of independence assumptions similar to that in the transition model (Section III-B), a belief $B$ can be factored into beliefs of individual targets and cameras:

$$B(S) = P((T, C)) = P(T)\,P(C) = \prod_{k=1}^{m} P(t_k) \prod_{i=1}^{n} P(c_i) = \prod_{k=1}^{m} b_k(t_k) \prod_{i=1}^{n} \delta_{\hat{c}_i}(c_i) \tag{4}$$

where $b_k$ denotes a belief over the set $\mathcal{T}$ of all possible states of target $k$ (i.e., $b_k(t_k)$ is the probability that target $k$ is in state $t_k$) and $\hat{c}_i$ denotes the current state of camera $i$, which, unlike a target's state, is fully observable to the POMDP controller since its position can be directly read from its port. Hence, the probability $P(c_i)$ that camera $i$ is in state $c_i$ can be represented by the Kronecker delta $\delta_{\hat{c}_i}(c_i)$, and the last equality in (4) follows.

The POMDP controller issues a joint action $A$ to move each camera $i$ from its current state $\hat{c}_i$ to its next state $\tau(\hat{c}_i, a_i)$, receives an observation $z_k$ of each target $k$, and then updates the prior belief $B$ to the posterior belief $B'$ using Bayes' rule (1). Similar to the factorization of the prior belief $B$ above, the posterior belief $B'$ can also be factored into posterior beliefs of individual targets and cameras:

$$B'(S') = \prod_{k=1}^{m} b'_k(t'_k) \prod_{i=1}^{n} \delta_{\tau(\hat{c}_i, a_i)}(c'_i) \tag{5}$$

where the posterior belief $b'_k$ of target $k$ is defined as

$$b'_k(t'_k) = \eta_k\, P(z_k \mid t'_{l_k}, C') \sum_{t_k \in \mathcal{T}} P(t'_k \mid t_k)\, b_k(t_k) , \tag{6}$$

$C' = (c'_1, \ldots, c'_n)$ with $c'_i = \tau(\hat{c}_i, a_i)$, and $\eta_k = 1/P(z_k \mid b_k, C')$ is a normalizing constant. The derivation of (5) is reported in Appendix B.
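A minimal sketch of the per-target update (6) follows, assuming dictionary-based beliefs and the observation model of Section III-C. Because each target is updated on its own, the work per target depends on $|\mathcal{T}|$ rather than on $|\mathcal{T}|^m$. The 1-D corridor motion model (`corridor_step`) and all function names are hypothetical. After a null observation, the uniform constant cancels in the normalization, so the posterior is simply the predicted belief restricted to the uncovered locations and renormalized.

```python
def observation_prob(z, t_l, fov, n_uncovered):
    """Per-target observation model of Section III-C (z=None is the null observation)."""
    if t_l in fov:
        return 1.0 if z == t_l else 0.0
    return 1.0 / n_uncovered if z is None else 0.0


def update_target_belief(b, z, trans, fov, locations, loc_of=lambda t: t):
    """Eq. (6): b'_k(t') = eta_k P(z_k | t'_l, C') sum_t P(t'|t) b_k(t).

    b: dict state -> probability; trans(t): dict t' -> P(t'|t); fov: fov(C');
    loc_of(t): location component of a state. Returns (posterior, P(z_k | b_k, C')).
    """
    n_uncovered = len(set(locations) - set(fov))
    predicted = {}
    for t, p in b.items():                       # prediction: sum_t P(t'|t) b_k(t)
        for t2, q in trans(t).items():
            predicted[t2] = predicted.get(t2, 0.0) + q * p
    unnorm = {t2: observation_prob(z, loc_of(t2), fov, n_uncovered) * p
              for t2, p in predicted.items()}   # correction by the observation model
    p_z = sum(unnorm.values())                   # 1 / eta_k
    if p_z == 0.0:
        raise ValueError("observation has zero probability under the predicted belief")
    return {t2: w / p_z for t2, w in unnorm.items() if w > 0.0}, p_z


def corridor_step(t, last=9, p_move=0.7):
    """Hypothetical 1-D motion: move one cell to the right w.p. p_move, else stay."""
    out = {t: 1.0 - p_move}
    nxt = min(t + 1, last)
    out[nxt] = out.get(nxt, 0.0) + p_move
    return out


if __name__ == "__main__":
    cells, fov = range(10), {5, 6}
    print(update_target_belief({4: 1.0}, 5, corridor_step, fov, cells))  # ({5: 1.0}, 0.7)
    print(update_target_belief({2: 1.0}, None, corridor_step, fov, cells))
```

In the second call, the target is not seen, so its belief keeps the predicted split of roughly 0.3 and 0.7 between cells 2 and 3.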

E. Objective/Reward Function $R$

The goal of the surveillance system is to maximize the number of targets observed with a guaranteed resolution. This can be achieved by defining a reward function that measures the total number of targets lying within the joint fov of the active cameras corresponding to their joint state $C$:

$$R(S) = R((T, C)) \triangleq \sum_{k=1}^{m} R(t_k, C) \tag{7}$$

where

$$R(t_k, C) = \begin{cases} 1 & \text{if } t_{l_k} \in \mathrm{fov}(C), \\ 0 & \text{otherwise.} \end{cases}$$



Since the exact locations of the targets may not be fully observable to the cameras at all times, the POMDP controller has to track the joint belief $B$ of the targets and consider the expected reward with respect to this belief instead:

$$R(B) \triangleq \sum_{S \in \mathcal{S}} R(S)\, B(S) = \sum_{k=1}^{m} R(b_k, C) \tag{8}$$

where $C = (c_1, \ldots, c_n)$ denotes the cameras' current joint state and

$$R(b_k, C) \triangleq \sum_{t_k \in \mathcal{T}} R(t_k, C)\, b_k(t_k) . \tag{9}$$

The derivation of (8) is reported in Appendix C.
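The decomposition (8)-(9) can be checked numerically. The sketch below, assuming location-only target states and dictionary-based beliefs (all names and numbers are hypothetical), computes the factored expected reward in time linear in the number of targets and compares it with a brute-force sum over all joint target states, which grows exponentially with $m$.

```python
import itertools
import math


def expected_reward(beliefs, fov, loc_of=lambda t: t):
    """Eqs. (8)-(9): R(B) = sum_k sum_{t_k} R(t_k, C) b_k(t_k), R(t_k, C) = 1 if t_lk in fov(C)."""
    return sum(p for b in beliefs for t, p in b.items() if loc_of(t) in fov)


def expected_reward_joint(beliefs, fov, loc_of=lambda t: t):
    """Brute-force check: sum over joint target states T of R((T, C)) * prod_k b_k(t_k)."""
    total = 0.0
    for combo in itertools.product(*(b.items() for b in beliefs)):
        prob = math.prod(p for _, p in combo)
        total += prob * sum(1.0 for t, _ in combo if loc_of(t) in fov)
    return total


if __name__ == "__main__":
    beliefs = [{2: 0.5, 5: 0.5}, {6: 0.2, 8: 0.8}]   # hypothetical beliefs of two targets
    fov = {5, 6}                                      # hypothetical joint fov(C)
    print(expected_reward(beliefs, fov), expected_reward_joint(beliefs, fov))
```

Both computations give 0.7 (up to floating-point rounding): target 1 is in the fov with probability 0.5 and target 2 with probability 0.2.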



IV. Policy Computation 

Recall that a policy $\pi$ for the POMDP controller is a mapping from each belief $B$ to a joint action $A \in \mathcal{A}$ of the $n$ cameras. At every time step, the POMDP controller determines the optimal policy $\pi^*$ for the belief $B$ such that the expected number of observed targets in the next time step is maximized. Since the observations of the $m$ targets taken by the cameras in the next time step are not known to the POMDP controller, it has to consider the expected reward with respect to these future observations. Then, the optimal policy $\pi^*$ for a given belief $B$ becomes

$$\pi^*(B) = \arg\max_{A \in \mathcal{A}} V(B, A) \tag{10}$$

where

$$V(B, A) = \sum_{Z \in \mathbf{Z}} R(B')\, P(Z \mid B, A) . \tag{11}$$

Computing the policy $\pi^*$ (10) for a given belief $B$ incurs $O(|\mathcal{A}|\,|\mathcal{Z}|^m\,|\mathcal{T}|)$ time, which is exponential in the number $m$ of targets. Fortunately, by exploiting the simplified transition and observation models due to the conditional independence assumptions (i.e., (2) and (3)), this computational cost can be significantly reduced. In particular, it is derived in Appendix D that the value function $V(B, A)$ of $m$ targets can be simplified to a sum of value functions $V(b_k, C')$ of individual targets $k = 1, \ldots, m$:

$$V(B, A) = \sum_{k=1}^{m} V\big(b_k, (\tau(c_1, a_1), \ldots, \tau(c_n, a_n))\big) \tag{12}$$

where

$$V(b_k, C) \triangleq \sum_{z_k \in \mathrm{fov}(C)} \sum_{t'_k \in \mathcal{T}} R(t'_k, C)\, \tilde{b}'_k(t'_k) \tag{13}$$

and $\tilde{b}'_k$ is the unnormalized posterior belief of target $k$ (i.e., $\tilde{b}'_k(t'_k) = b'_k(t'_k)/\eta_k$). Using (12) and (13), we obtain the following result:

Theorem 1: If (2) and (3) hold, then computing the policy $\pi^*$ (10) for a given belief $B$ incurs $O(|\mathcal{A}|\,|\mathcal{Z}|\,|\mathcal{T}|\,m)$ time.

Computing the value function $V(b_k, C)$ (13) for a single target $k$ incurs $O(|\mathcal{Z}|\,|\mathcal{T}|)$ time. For $m$ targets, the value function $V(B, A)$ (12) therefore incurs $O(|\mathcal{Z}|\,|\mathcal{T}|\,m)$ time. Finally, computing the optimal policy $\pi^*$ (10) for a given belief $B$ incurs $O(|\mathcal{A}|\,|\mathcal{Z}|\,|\mathcal{T}|\,m)$ time, which is linear in the number $m$ of targets.
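A sketch of this linear-time policy computation is shown below under simplifying assumptions that are not from the paper: target states are locations on a 1-D corridor, each PTZ command moves a camera directly to the commanded state (so $\tau(c_i, a_i) = a_i$), and the toy camera layout and all names are hypothetical. The prediction $\sum_{t_k} P(t'_k \mid t_k)\, b_k(t_k)$ does not depend on the action, so this sketch computes it once per target before looping over joint actions. With the deterministic in-fov observation model, (13) reduces to the predicted probability mass of target $k$ inside $\mathrm{fov}(C')$. The loop still enumerates all $|\mathcal{A}|$ joint actions, which is consistent with the limited scalability in the number of cameras noted in Section V-B.

```python
import itertools


def predict(b, trans):
    """Prediction step sum_t P(t'|t) b_k(t); it does not depend on the cameras' action."""
    pred = {}
    for t, p in b.items():
        for t2, q in trans(t).items():
            pred[t2] = pred.get(t2, 0.0) + q * p
    return pred


def target_value(pred, fov):
    """Eq. (13) with the unnormalized posterior b~'_k(t') = P(z_k | t'_l, C') * pred(t').

    States are locations here (an assumption), and observations inside fov(C') are exact,
    so only z_k = t' contributes.
    """
    v = 0.0
    for z in fov:                              # z_k ranges over fov(C')
        for t2, p in pred.items():
            obs = 1.0 if t2 == z else 0.0       # P(z_k | t'_l, C') for z_k in fov(C')
            reward = 1.0 if t2 in fov else 0.0  # R(t'_k, C')
            v += reward * obs * p
    return v


def greedy_joint_action(beliefs, trans, cam_fov):
    """Eqs. (10) and (12): argmax over joint actions of sum_k V(b_k, C').

    cam_fov[i][c] is the fov of camera i in state c; command a_i moves camera i to state a_i.
    """
    preds = [predict(b, trans) for b in beliefs]  # computed once per target
    best, best_v = None, float("-inf")
    for action in itertools.product(*(sorted(f) for f in cam_fov)):
        fov = set().union(*(cam_fov[i][a] for i, a in enumerate(action)))  # fov(C')
        v = sum(target_value(p, fov) for p in preds)                      # Eq. (12)
        if v > best_v:
            best, best_v = action, v
    return best, best_v


def corridor_step(t, last=9, p_move=0.7):
    """Hypothetical 1-D motion: move one cell right w.p. p_move, else stay."""
    out = {t: 1.0 - p_move}
    nxt = min(t + 1, last)
    out[nxt] = out.get(nxt, 0.0) + p_move
    return out


if __name__ == "__main__":
    cam_fov = [{0: {0, 1, 2}, 1: {3, 4, 5}},   # camera 0: two hypothetical states
               {0: {5, 6}, 1: {7, 8, 9}}]      # camera 1: two hypothetical states
    beliefs = [{1: 1.0}, {6: 1.0}]
    print(greedy_joint_action(beliefs, corridor_step, cam_fov))
```

In this toy instance the selected joint action is (0, 1), with a value of about 1.7 expected observed targets: camera 0 keeps target 1 fully in view, and camera 1 anticipates target 2 moving into cell 7.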
Fig. 2. Setups of simulated surveillance environments: (a) hall ($|\mathcal{T}_l| = 20 \times 10$ target locations), (b) corridor ($|\mathcal{T}_l| = 40 \times 5$ target locations), and (c) junction ($|\mathcal{T}_l| = 168$ target locations) with its corresponding real-world map shown in (d).



V. Experiments and Discussion 

This section evaluates the performance of our proposed POMDP controller in simulations over various realistic surveillance environments using the Player/Stage simulator and with real AXIS 214 PTZ cameras to show its practical feasibility in real-world surveillance. Our POMDP-based approach (denoted by P in Fig. 3), which uses only active PTZ cameras, is compared against the following state-of-the-art multi-camera coordination and control techniques under partially observable surveillance environments:

• MDP with only PTZ cameras (MP): This approach uses a Markov Decision Process (MDP) controller [5] to coordinate and control the active cameras. There is no static camera to directly observe the targets' locations; hence, they are observed only from the active cameras' fov;

• MDP with static and PTZ cameras (MSP): This approach uses the MDP controller [9] to coordinate and control the active cameras, which are supported by wide-view static cameras. Gaussian noise is added to the location of each target observed by the static cameras such that the Gaussian variance increases with the distance of the target from the static camera;

• Systematic Approach (Sys): The active cameras are panned systematically through each of their states in a round-robin fashion, moving to the next state at every time step; and

• Static Approach (Stat): The active cameras are fixed at a particular state such that they observe the maximum area of the environment.

The performance metric used to evaluate the above approaches is given by

$$\text{PercentObs} = \frac{100}{r\, M_{tot}} \sum_{i=1}^{r} M^{i}_{obs}$$

where $r$ (set to 100 in simulations) is the total number of time steps taken in our experiments, $M^{i}_{obs}$ is the total number of targets observed by the active cameras at time step $i$, and $M_{tot}$ is the total number of targets present in the environment. That is, the PercentObs metric averages the percentage of targets being observed by the active cameras over the entire duration of $r$ time steps.
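As a quick illustration of the metric, the sketch below evaluates the formula for a hypothetical run; the function name and the per-step counts are made up for the example.

```python
def percent_obs(observed_counts, total_targets):
    """PercentObs = 100 / (r * M_tot) * sum_{i=1}^{r} M_obs^i over r time steps."""
    num_steps = len(observed_counts)  # r
    if num_steps == 0 or total_targets == 0:
        raise ValueError("need at least one time step and one target")
    return 100.0 * sum(observed_counts) / (num_steps * total_targets)


if __name__ == "__main__":
    # Hypothetical run: 4 targets, observed counts at each of 4 time steps.
    print(percent_obs([3, 2, 4, 3], 4))  # 75.0
```

Averaging the per-step counts first and dividing once by $r\, M_{tot}$ is equivalent to averaging the per-step percentages, since $M_{tot}$ is constant over the run.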

A. Simulated Experiments 

Fig. 2 shows three different setups of simulated surveillance environments: (a) hall, (b) corridor, and (c) junction. The junction setup simulates a surveillance environment within our university campus, which contains obstacles like buildings and walls (black shades in Fig. 2(c)). In order to introduce more occlusions in the environment, we have added a virtual pillar in the center of the junction setup (Fig. 2(c)). The active cameras are simulated in the Player/Stage simulator by configuring the states of the cameras across various pan angles, as discussed in Section III-A. There are $n = 4$ active cameras with $|\mathcal{C}| = 3$ states each. The targets' trajectories are generated in the simulator based on the velocity-direction motion model (Section III-B), which resembles real human motion in a surveillance environment. Every target can move in one of the 8 possible discretized directions $\mathcal{T}_d = \{0^\circ, 45^\circ, \ldots, -90^\circ, -45^\circ\}$ with an assumed velocity of 1.5 cells per time step. The transition model, observation model, and reward function for a single target are computed and stored offline for the above setups.

Fig. 3. Graphs of PercentObs vs. number $m$ of targets with $n = 4$ active cameras for the (a) corridor, (b) hall, and (c) junction setups.

Fig. 3 shows the comparison of performance of the different approaches for up to $m = 20$ targets for all three setups. It can be observed that our POMDP controller outperforms the other evaluated approaches in all three setups. The detailed observations from the experiments are as follows:

Our POMDP controller performs better than the MP approach because (a) when the targets leave the fov of any of the cameras and enter an occluded region, the active cameras in the MP approach have no idea where the targets will be moving to in the next few time steps, and (b) when the targets enter the fov of any of the active cameras from an occluded region, the directions of the targets are wrongly interpreted by the MDP controller. This is a serious limitation of the MP approach, i.e., there is no way of knowing the direction of the targets when they are in an occluded region. In contrast, the Bayesian belief update process in our POMDP controller helps to trace the locations and directions of the targets, even when they are not observed by any of the cameras. Hence, the cameras are controlled based on the belief of the targets.

TABLE I
Performance for real camera experiments.

No. of targets m | 1    | 2    | 3    | 4    | 5
PercentObs       | 98.2 | 96.6 | 93.3 | 91.5 | 87

Our POMDP controller performs better than the MSP approach because, when the static cameras observe targets that are far away, they obtain noisy locations of the targets. This in turn induces errors in the direction and velocity of the targets. Hence, when the noisy targets' information is used in the MDP controller, it predicts the expected locations of the targets poorly, which consequently affects the performance of the MSP approach. In contrast, in our POMDP-based approach, the targets' locations are observed by high-resolution active cameras whose calibration error is bounded by limiting the depth of their fov (Section III-A). Since the observations (i.e., locations of the targets) for POMDP are more accurate than in the MSP approach, the predictions of the targets' locations and directions through the Bayesian belief update process are also more accurate.

Our POMDP controller performs much better than the Sys and Stat baseline approaches because, in our approach, the active cameras are controlled based on the targets' predicted motion and the observations taken by the active cameras. In contrast, in the Sys approach, the cameras are panned without accounting for the targets' information such as locations and directions, while, in the Stat approach, every camera is fixed in one of its states.

To summarize, our POMDP-based approach performs better than the MP approach due to its ability to keep track of the targets' locations and directions through its Bayesian belief update process. It outperforms the MSP approach because the observations (i.e., the targets' locations) taken by the active cameras in our POMDP controller are more accurate than the noisy observations taken by the static cameras in the MSP approach. Lastly, the Sys and Stat approaches suffer from performance degradation because the cameras are controlled independently of the targets' information.

B. Real Camera Experiments 

The feasibility of our POMDP controller is tested using real AXIS 214 PTZ cameras to monitor Lego robots (targets) over an environment of size $|\mathcal{T}_l| = 10 \times 8$ grid cells. We have $n = 3$ PTZ cameras, each of which has $|\mathcal{C}| = 3$ states. These cameras are calibrated in each of their states, and the depth of the fov of each camera is determined empirically. The Lego robots are programmed to move based on the velocity-direction motion model. Table I shows the performance of our approach in the real camera experiments. Due to space limitations, we showcase the detailed results of the real camera experiments in a demo video.

The limitations of our POMDP-based approach are as follows: (a) it scales well only in the number of targets, and its scalability in the number of cameras needs improvement; and (b) it works well only if the underlying computer vision algorithms for target detection and recognition perform accurately. In our future work, we would like to extend this work by scaling to a large number of cameras and by accounting for the uncertainties arising from the underlying vision algorithms. We would also like to deploy active cameras along with a team of robots [?], [?] for indoor surveillance.

VI. Conclusion 

This paper describes a novel POMDP-based approach to coordinating and controlling a network of active cameras for maximizing the number of targets observed with a guaranteed resolution in an uncertain, partially observable surveillance environment. Specifically, our approach helps to eliminate the dependency on wide-view static cameras for tracking the targets' locations and simultaneously performs the tracking and observation of the targets at high resolution. We have exploited the conditional independence properties of the targets' motion and observations in our surveillance problem to reduce the policy computation time from exponential to linear in the number of targets. The experimental evaluation shows that our proposed POMDP controller performs better than the state-of-the-art approaches and is feasible and practical to operate in real-world environments.
Appendix

A. Observation model factorization 

$$\begin{aligned}
P(Z \mid S) &= P(Z \mid T, C) \\
&= P(z_1, z_2, \ldots, z_m \mid t_1, t_2, \ldots, t_m, C) \\
&= \prod_{k=1}^{m} P(z_k \mid t_k, C) \\
&= \prod_{k=1}^{m} P(z_k \mid t_{l_k}, C) .
\end{aligned}$$

The last two equalities are due to the conditional independence assumption in the observation model (Section III-C).



B. Posterior belief decomposition 



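$$\begin{aligned}
B'(S') &= \eta\, P(Z \mid S') \sum_{S \in \mathcal{S}} P(S' \mid S, A)\, B(S) \\
&= \eta \prod_{k=1}^{m} P(z_k \mid t'_{l_k}, C') \sum_{T \in \mathcal{T}^m} \sum_{C \in \mathcal{C}^n} \prod_{k=1}^{m} P(t'_k \mid t_k)\, b_k(t_k) \prod_{i=1}^{n} \delta_{\tau(c_i, a_i)}(c'_i)\, \delta_{\hat{c}_i}(c_i) \\
&= \eta \prod_{k=1}^{m} P(z_k \mid t'_{l_k}, C') \prod_{i=1}^{n} \delta_{\tau(\hat{c}_i, a_i)}(c'_i) \sum_{T \in \mathcal{T}^m} \prod_{k=1}^{m} P(t'_k \mid t_k)\, b_k(t_k) \\
&= \eta \prod_{k=1}^{m} P(z_k \mid t'_{l_k}, C') \prod_{i=1}^{n} \delta_{\tau(\hat{c}_i, a_i)}(c'_i) \prod_{k=1}^{m} \sum_{t_k \in \mathcal{T}} P(t'_k \mid t_k)\, b_k(t_k) \\
&= \prod_{k=1}^{m} \Big( \eta_k\, P(z_k \mid t'_{l_k}, C') \sum_{t_k \in \mathcal{T}} P(t'_k \mid t_k)\, b_k(t_k) \Big) \prod_{i=1}^{n} \delta_{\tau(\hat{c}_i, a_i)}(c'_i) \\
&= \prod_{k=1}^{m} b'_k(t'_k) \prod_{i=1}^{n} \delta_{\tau(\hat{c}_i, a_i)}(c'_i)
\end{aligned}$$

where $C' = (c'_1, \ldots, c'_n)$ and $\hat{c}_i$ is the current state of camera $i$. The first equality is due to (1). The second equality follows from (2), (3), and (4). The third equality sums out the current camera states using the Kronecker deltas, and the fourth equality distributes the sum over $T$ across the independent target factors. The fifth equality follows from $\eta = \prod_{k=1}^{m} \eta_k$, which holds because $P(Z \mid B, A) = \prod_{k=1}^{m} P(z_k \mid b_k, C')$ under the same independence assumptions. The last equality is due to (6).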



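C. Reward function decomposition

$$\begin{aligned}
R(B) &= \sum_{S \in \mathcal{S}} R(S)\, B(S) \\
&= \sum_{(T, C) \in \mathcal{S}} R((T, C))\, B((T, C)) \\
&= \sum_{C \in \mathcal{C}^n} \sum_{T \in \mathcal{T}^m} \sum_{k=1}^{m} R(t_k, C) \prod_{j=1}^{m} b_j(t_j) \prod_{i=1}^{n} \delta_{\hat{c}_i}(c_i) \\
&= \sum_{k=1}^{m} \sum_{t_k \in \mathcal{T}} R(t_k, \hat{C})\, b_k(t_k) \Big( \sum_{T_{-k} \in \mathcal{T}^{m-1}} \prod_{j \neq k} b_j(t_j) \Big) \\
&= \sum_{k=1}^{m} \sum_{t_k \in \mathcal{T}} R(t_k, \hat{C})\, b_k(t_k) \\
&= \sum_{k=1}^{m} R(b_k, \hat{C})
\end{aligned}$$

where $T_{-k} = (t_1, \ldots, t_{k-1}, t_{k+1}, \ldots, t_m)$ and $\hat{C} = (\hat{c}_1, \ldots, \hat{c}_n)$ is the cameras' current joint state (denoted $C$ in (8)). The third equality is due to (4) and (7). The fourth equality sums out the camera states using the Kronecker deltas and regroups the terms of each target $k$. The fifth equality follows from our independence assumption similar to that in (4) and the law of total probability:

$$\sum_{T_{-k} \in \mathcal{T}^{m-1}} \prod_{j \neq k} b_j(t_j) = \sum_{T_{-k} \in \mathcal{T}^{m-1}} P(T_{-k}) = 1 .$$

The last equality is due to (9).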

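D. Value function decomposition

$$\begin{aligned}
V(B, A) &= \sum_{Z \in \mathbf{Z}} R(B')\, P(Z \mid B, A) \\
&= \sum_{Z \in \mathbf{Z}} \sum_{k=1}^{m} R(b'_k, C')\, P(Z \mid B, A) \\
&= \sum_{k=1}^{m} \sum_{z_k \in \mathcal{Z}} R(b'_k, C') \sum_{Z_{-k} \in \mathcal{Z}^{m-1}} P(Z \mid B, A) \\
&= \sum_{k=1}^{m} \sum_{z_k \in \mathcal{Z}} R(b'_k, C')\, P(z_k \mid b_k, C') \sum_{Z_{-k} \in \mathcal{Z}^{m-1}} P(Z_{-k} \mid B_{-k}, A) \\
&= \sum_{k=1}^{m} \sum_{z_k \in \mathrm{fov}(C')} R(b'_k, C')\, P(z_k \mid b_k, C') \\
&= \sum_{k=1}^{m} \sum_{z_k \in \mathrm{fov}(C')} \sum_{t'_k \in \mathcal{T}} R(t'_k, C')\, b'_k(t'_k)\, P(z_k \mid b_k, C') \\
&= \sum_{k=1}^{m} \sum_{z_k \in \mathrm{fov}(C')} \sum_{t'_k \in \mathcal{T}} R(t'_k, C')\, \tilde{b}'_k(t'_k)
\end{aligned}$$

where $C' = (c'_1, \ldots, c'_n)$ with $c'_i = \tau(c_i, a_i)$, and $Z_{-k} = (z_1, \ldots, z_{k-1}, z_{k+1}, \ldots, z_m)$. The first equality is due to (11). The second equality is due to (8) applied to the posterior belief $B'$. The third equality rearranges the sums, since $b'_k$ depends on $Z$ only through $z_k$. The fourth equality follows from (2), (3), and (4). The fifth equality follows from $P(Z_{-k} \mid B_{-k}, A) = \prod_{j \neq k} P(z_j \mid b_j, C')$, where $B_{-k}$ denotes the belief over all targets except target $k$, and then the law of total probability: $\sum_{z_j \in \mathcal{Z}} P(z_j \mid b_j, C') = 1$. Also, note that when $z_k \notin \mathrm{fov}(C')$, $R(b'_k, C') = 0$. The sixth equality is due to (9). Since the normalizing constant of $b'_k(t'_k)$ is $\eta_k = 1/P(z_k \mid b_k, C')$, where

$$\eta_k^{-1} = \sum_{t'_k \in \mathcal{T}} P(z_k \mid t'_{l_k}, C') \sum_{t_k \in \mathcal{T}} P(t'_k \mid t_k)\, b_k(t_k) = P(z_k \mid b_k, C') ,$$

the seventh equality follows. By (13), the last expression equals $\sum_{k=1}^{m} V(b_k, C')$, which gives (12).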